We study the relationship between dynamics and computational capability in Random Boolean
Networks (RBNs) for Reservoir Computing. Reservoir Computing (RC) is a computational
paradigm in which a trained readout layer interprets the dynamics of an excitable component called
the reservoir, which is perturbed by external input. The reservoir is often implemented as a
homogeneous recurrent neural network, but there has been little investigation into the properties of
reservoirs that are discrete and heterogeneous. RBNs are generic and heterogeneous dynamical
systems, and here we use them as the reservoir. An RBN is typically a closed system which
consists of a network of N nodes with an average in-degree ⟨K⟩, which we extend with an input layer
that perturbs L nodes. We measure an extended separation property and the fading memory of
externally perturbed RBNs and show that the optimal balance of these measures is reached at
critical dynamics. The computational capability of the network is an interplay between L, ⟨K⟩, and
the length of the input stream T. We explore the L which adequately distributes input signals into
the RBN, and find that it is dependent on ⟨K⟩. Finally, we show that under most circumstances,
near-critical connectivity ⟨K_c⟩ is desirable for reservoirs, but circumstances exist where ordered and
chaotic networks are viable. These results are relevant to the construction of devices which exploit
the intrinsic dynamics of complex, heterogeneous systems, such as biomolecular networks. Our
findings underscore the supposition that intrinsic computational capabilities are maximal in substrates
"at the edge of chaos."



I. INTRODUCTION 

Reservoir computing is an emerging paradigm that promotes computing using the intrinsic
dynamics of an excitable system called the reservoir. The reservoir acts as a temporal kernel
function, projecting the input stream into a higher dimensional space, thereby creating
features for the readout layer. To produce the desired output, the readout layer performs a
dimensionality reduction on the traces of the input signal in the reservoir. Two advantages
of RC are computationally inexpensive training and flexibility in reservoir implementation.
This makes RC suitable for emerging unconventional computing paradigms, such as computing
with physical phenomena [2] and self-assembled electronic architectures [3]. Maass et al. [4]
initially proposed a version of RC called Liquid State Machine (LSM) as a model of cortical
microcircuits. Independently, Jaeger [5] introduced a variation of RC called Echo State
Machine (ESM) as an alternative recurrent neural network approach for control tasks.
Variations of both LSM and ESM have been proposed for many different machine learning and
system control tasks (Lukosevicius and Jaeger [1]). So far, most RC research has focused on
reservoirs with homogeneous in-degrees and transfer functions. However, due to high design
variation and the lack of control over these devices, most self-assembled systems are
heterogeneous in their connectivity and transfer functions.

Since RC can be used to harness the intrinsic computational capabilities of physical systems,
our study is motivated by three fundamental questions about heterogeneous reservoirs:

1. What is the relationship between the dynamical 
properties of a heterogeneous system and its com- 
putational capability as a reservoir? 

2. How much does a reservoir need to be perturbed 
to adequately distribute the input signal? It may 
be infeasible to perturb the entire system. Also, 
a single-point perturbation may not propagate 
throughout the system due to its internal topol- 
ogy. Thus, we consider the size of the perturbation 
necessary to adequately distribute the input signal. 

3. In a physical RC device, it may be difficult to ob- 
serve the entire system. How much of the system 
and which components ought to be observed to ex- 
tract features about the input stream? 

We model the reservoirs with Random Boolean Networks (RBNs), which are chosen for their
heterogeneity, simplicity, and generality. Kauffman [6] first introduced this model to study
gene regulatory networks. He showed these Boolean networks to be in a complex dynamical
phase at "the edge of chaos" when the average connectivity (in-degree) of the network is
⟨K⟩ = 2 (critical connectivity). Rohlf et al. [7] showed that with near-critical connectivity,
information propagation in Boolean networks becomes independent of system size. Packard [8]
used an evolutionary algorithm to evolve Cellular Automata (CA) for solving computational
tasks. He found the first evidence that connects critical dynamics and optimal computation
in CA. Detailed analysis by Mitchell et al. [9] refuted this idea and attributed the
evolutionary behavior of the CA to genetic drift, not to the CA dynamics. Goudarzi et al. [10]
studied adaptive computation and task solving in Boolean networks and found that learning
drives the network to the critical connectivity ⟨K_c⟩ = 2.

Snyder et al. [11] introduced RBNs for RC, and found optimal task solving in networks with
⟨K⟩ > ⟨K_c⟩. Here, using a less restrictive RC architecture, we find that RBNs with critical
dynamics provided by ⟨K_c⟩ tend to offer higher computational capability than those with
ordered or chaotic dynamics.

To be suitable for computation, a reservoir needs to eventually forget past perturbations,
while possessing dynamics which respond in different ways to different input streams. The
first requirement is captured by fading memory. The separation property captures the second
requirement and computes a distance measurement between the states of two identical
reservoirs after being perturbed by two distinct input streams. It has been hypothesized that
computational capabilities are optimal when the separation property is highest, but old input
is eventually forgotten by the reservoir, which occurs when fading memory is lowest [12]. We
extend the measurements described in [12, 13] to predict the computational capability of a
reservoir in finite time-scales with a short-term memory requirement.

II. MODEL 

A Reservoir Computing device is made up of three parts: input layer, reservoir, and readout
layer (see figure 1). The input layer excites the reservoir by passing an input signal to it,
and the readout layer interprets the traces of the input signal in the reservoir dynamics to
compute the desired output. In our model, the reservoir is a Random Boolean Network (RBN).
The fundamental subunit of an RBN is a node with K input connections. At any instant in
time, the node can assume either of the two binary states, "0" or "1." The node updates its
state at time t according to a K-to-1 Boolean mapping of its K inputs. Therefore, the state of
a single node at time t + 1 is completely determined by its K inputs at time t and by one of
the $2^{2^K}$ Boolean functions used by the node. An RBN is a collection of N such binary
nodes. Each node i of the N nodes receives K_i inputs, each of which is connected to one of
the N nodes in the network. In this model, self-connections are allowed.

The network is random in two different ways: 1) the source node of each input is chosen from
the N nodes in the network with uniform probability and 2) the Boolean function of node i is
chosen from the $2^{2^{K_i}}$ possibilities with uniform probability. Each node sends the
same value on all of its output connections to the destination nodes. The average
connectivity is $\langle K \rangle = \frac{1}{N} \sum_{i=1}^{N} K_i$. We study the properties
of RBNs characterized by N nodes and average connectivity ⟨K⟩. This refers to all the
instantiations of such RBNs. Once the network is instantiated, the collective time evolution
can be described by $x_i^{t+1} = f_i(x_1^t, x_2^t, \ldots, x_{K_i}^t)$, where $x_i^t$ is the
state of node i at time t and $f_i$ is the Boolean function that governs the state update of
node i. The nodes are updated synchronously, i.e., all the nodes update their state according
to a single global clock signal.

From a graph-theoretical perspective, an RBN is a directed graph with N vertices and
E = ⟨K⟩N directed edges (rounded to an integer). We construct the graph according to the
random graph model [14]. We call this model a heterogeneous RBN because the nodes can have
different in-degrees. In the classical RBN model, all the nodes have identical in-degrees and
therefore are homogeneous. The original model of Kauffman [6] assumes a static environment
and therefore does not include exogenous inputs to the network. To use RBNs as the reservoir,
we introduce I additional input nodes that each distribute the input signals to L randomly
picked nodes in the network. The source nodes of the K_i links for each node i are randomly
picked from the N nodes with uniform probability. The input nodes are not counted in
calculating ⟨K⟩. For online computation, the reservoir is extended by a separate readout
layer with O nodes. Each node in the readout layer is connected to each node in the
reservoir. The output of node o in the readout layer at time t is denoted by $y_o^t$ and is
computed according to

$$y_o^t = \mathrm{sign}\Big(\sum_{j=1}^{N} \alpha_j x_j^t + b\Big).$$

Parameters $\alpha_j$ are the weights on the inputs from node j in the reservoir to node o in
the readout layer, and b is the common bias for all the readout nodes. Parameters $\alpha_j$
and b can be trained using any regression algorithm to compute a target output [5]. In this
paper, we are concerned with RBN-RC devices with a single input node and a single output
node.
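The model in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the names `make_rbn`, `step`, and `run` are hypothetical, edges are drawn independently (so duplicate edges may occur, unlike a strict random graph model), the all-zero initial state is arbitrary, and an input-driven node is assumed to treat the external bit as one extra argument of its Boolean function, a detail the text does not fix.

```python
import random

def make_rbn(n, k_avg, n_perturbed, seed=0):
    """Build a heterogeneous RBN with int(k_avg * n) random directed edges."""
    rng = random.Random(seed)
    sources = [[] for _ in range(n)]
    for _ in range(int(k_avg * n)):
        # Target and source are both uniform over the nodes; self-loops allowed.
        sources[rng.randrange(n)].append(rng.randrange(n))
    # The single input node perturbs n_perturbed (L) randomly picked nodes.
    driven = set(rng.sample(range(n), n_perturbed))
    tables = []
    for i in range(n):
        # ASSUMPTION: the external bit is one extra input of a driven node.
        arity = len(sources[i]) + (1 if i in driven else 0)
        # A uniformly random Boolean function is a random truth table.
        tables.append([rng.randint(0, 1) for _ in range(2 ** arity)])
    return sources, driven, tables

def step(rbn, state, u):
    """Synchronous update: every node reads the state from time t."""
    sources, driven, tables = rbn
    nxt = []
    for i, srcs in enumerate(sources):
        bits = [state[j] for j in srcs]
        if i in driven:
            bits.append(u)
        index = 0
        for b in bits:
            index = (index << 1) | b  # pack the inputs into a table index
        nxt.append(tables[i][index])
    return nxt

def run(rbn, stream, state=None):
    """Drive the reservoir with an input stream and return the final state."""
    n = len(rbn[0])
    state = [0] * n if state is None else list(state)
    for u in stream:
        state = step(rbn, state, u)
    return state

rbn = make_rbn(n=500, k_avg=2.0, n_perturbed=20, seed=1)
final_state = run(rbn, [0, 1, 1, 0, 1, 0, 0, 1])
```

Storing each Boolean function as a truth table keeps the update a simple lookup, at the cost of a table whose size grows as 2 to the power of the in-degree, which stays small for the connectivities considered here.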



III. MEASURES 
A. Perturbation Spreading 

RBNs are typically studied as closed systems in which the notion of damage spreading is used
to classify the RBNs' dynamics as ordered, critical, or chaotic [15]. Because our model
requires external perturbations, we must extend the notion of damage spreading to account for
RBNs which are continuously excited by external input. Since an RBN used as a reservoir is
not a closed system, the propagation of external perturbations may behave differently from
the propagation of damage in the initial states of the RBN. Let M be an RBN with N nodes and
average connectivity ⟨K⟩. Let $u_a$ be an input stream, and $u_b$ be a variation of $u_a$.
Then the perturbation spreading of M with an input stream $u_a$ and its variation $u_b$ is

$$\mathrm{Perturb}(M, u_a, u_b) = \frac{H(A, B)}{N},$$

where A and B are the states of the RBN after being driven by input streams $u_a$ and $u_b$
respectively, and H(A, B) is the Hamming distance between the states.

FIG. 1. Schematic of a reservoir computing system. The input layer delivers the input signals
to random nodes inside the reservoir. The readout layer receives output signals from random
nodes inside the reservoir. The reservoir itself is made of a collection of computing nodes
that are randomly interconnected. The reservoir creates a representation of the input signals
that can be read and classified by the readout layer. Learning is performed by training only
the readout layer nodes and connections.

For a dynamical system to act as a reservoir, it needs to be excited in different ways by
very different input streams, while eventually forgetting past perturbations. These
requirements are captured by the notions of separation and fading memory in [12]. However, to
account for the importance of short-term memory in the reservoir and finite-length input
streams, we are specifically interested in the separation of the system τ time steps in the
past, within an input stream of length T.

The ability of the RBN to separate two input streams of length T, which differ for only the
first T − τ time steps, is given by

$$\mathrm{sep}_\tau(M, T) = \mathrm{Perturb}(M, u, v), \qquad (1)$$

where T = |u| = |v| and

$$v_i = \begin{cases} \mathrm{NOT}(u_i), & \text{if } i < T - \tau \\ u_i, & \text{otherwise.} \end{cases}$$



In order for an RC device to be able to generalize, a reservoir needs to eventually forget
past perturbations. Thus we define

$$\mathrm{fade}(M, T) = \mathrm{Perturb}(M, u, w), \qquad (2)$$

where T = |u| = |w| and

$$w_i = \begin{cases} \mathrm{NOT}(u_i), & \text{if } i = 0 \\ u_i, & \text{otherwise.} \end{cases}$$
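The two measures defined in this subsection can be sketched in a few lines of Python. The sketch is illustrative only and rests on stated assumptions: the function names are hypothetical, the reservoir is abstracted as a callable `run(stream)` that returns the final state, streams are indexed from 0 so that fading memory flips the bit at index 0, and a shift register stands in for an RBN so that the results can be checked by hand.

```python
def perturb(run, n_nodes, stream_a, stream_b):
    """Normalised Hamming distance between the two final reservoir states."""
    a, b = run(stream_a), run(stream_b)
    return sum(x != y for x, y in zip(a, b)) / n_nodes

def sep(run, n_nodes, u, tau):
    """Separation: v differs from u on the first T - tau steps only."""
    T = len(u)
    v = [1 - x if i < T - tau else x for i, x in enumerate(u)]
    return perturb(run, n_nodes, u, v)

def fade(run, n_nodes, u):
    """Fading memory: w differs from u only at the first step (index 0 assumed)."""
    w = [1 - u[0]] + list(u[1:])
    return perturb(run, n_nodes, u, w)

def shift_register(n_nodes):
    """Toy stand-in reservoir (not an RBN): the state is the last n_nodes inputs."""
    def run(stream):
        state = [0] * n_nodes
        for bit in stream:
            state = [bit] + state[:-1]
        return state
    return run

run8 = shift_register(8)
u = [0, 1] * 10                   # T = 20
print(sep(run8, 8, u, tau=3))     # 0.625: 5 of the 8 remembered bits were flipped
print(fade(run8, 8, u))           # 0.0: the first bit has left the register
```

An 8-bit shift register forgets everything older than 8 steps, so it has perfect fading memory but only a short window of separation. An RBN passed through the same functions would be averaged over many random streams and instantiations.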



Natschläger et al. [12] found that the computational capability of recurrent neural network
reservoirs is greatest when the difference between separation and fading memory is largest,
and that this coincides with critical dynamics. Therefore, we want fading memory to be low,
while separation is high. We define the computational capability of a reservoir M over an
input stream of length T, τ time steps in the past, as

$$\Delta(M, T, \tau) = \mathrm{sep}_\tau(M, T) - \mathrm{fade}(M, T). \qquad (3)$$



B. Entropy and Mutual Information



Information theory [16] provides a generic framework for measuring information transfer,
noise, and loss between a source and a destination. The fundamental quantity in information
theory is Shannon information, defined as the entropy $H_S$ of an information source S. For a
source S that takes a state in $\{s_i \mid 1 \le i \le n\}$ with probability $p(s_i)$, the
entropy is defined as:

$$H_S = -\sum_{i=1}^{n} p(s_i) \log_2 p(s_i). \qquad (4)$$

This is the amount of information that S contains. To measure how much information is
transferred between a source and a destination, we calculate the mutual information
MI(S : D) between a source S and a destination D with states $d_j$, $1 \le j \le m$. Before
we can calculate MI(S : D) we need to calculate the joint entropy of the source and
destination as follows:

$$H_{SD} = -\sum_{i=1}^{n} \sum_{j=1}^{m} p(s_i, d_j) \log_2 p(s_i, d_j). \qquad (5)$$

Now the mutual information is given by:

$$MI(S : D) = H_S + H_D - H_{SD}. \qquad (6)$$

We will see later how entropy and mutual information can be used to quantify how much
information from the input signals is transferred to the reservoir and how much information
the reservoir can provide about the output while it is performing computation.
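As a small illustration of equations (4) to (6), the following Python sketch estimates entropy and mutual information from paired samples. The function names are hypothetical, and the probabilities are assumed to be estimated by empirical frequencies, which the text does not state explicitly.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy in bits, with p(s) estimated by relative frequency."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def mutual_information(source, dest):
    """MI(S : D) = H_S + H_D - H_SD for two aligned sample sequences."""
    joint = list(zip(source, dest))  # each pair is one joint observation
    return entropy(source) + entropy(dest) - entropy(joint)

# A destination that copies the source shares all of its information ...
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
# ... while an independent destination shares none.
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0
```

Samples only need to be hashable, so a reservoir state can be passed as a tuple of node values and an input window as a tuple of bits.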



C. Tasks 

We use the temporal parity and density classification tasks to test the performance of the
reservoir systems. According to the task, the RC system is trained to continuously evaluate
n bits which were injected into the reservoir beginning at τ + n time steps in the past.



1. Temporal Parity 

The task determines if n bits τ + n to τ time steps in the past have an odd number of "1"
values. Given an input stream u, where |u| = T, a delay τ, and a window n ≥ 1,

$$\mathrm{PAR}_n(t) = \begin{cases} u(t - \tau), & \text{if } n = 1 \\ \bigoplus_{i=0}^{n-1} u(t - \tau - i), & \text{otherwise,} \end{cases}$$

where τ + n ≤ t ≤ T − τ − n.



2. Temporal Density 

The task determines whether, among an odd number n of bits from τ + n to τ time steps in the
past, there are more "1" values than "0" values. Given an input stream u, where |u| = T, a
delay τ, and a window n = 2k + 1, where k ≥ 1,

$$\mathrm{DNS}_n(t) = \begin{cases} 1, & \text{if } 2 \sum_{i=0}^{n-1} u(t - \tau - i) > n \\ 0, & \text{otherwise,} \end{cases}$$

where τ + n ≤ t ≤ T − τ − n.
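The two task definitions can be written as short Python functions. This is a sketch under assumptions: the names `window`, `parity`, and `density` are hypothetical, the stream is a 0-indexed list of bits, and the explicit bounds check is an addition that guards against Python's negative indexing rather than part of the task definition.

```python
def window(u, t, tau, n):
    """Return the n bits u(t - tau - i) for i = 0 .. n - 1."""
    oldest = t - tau - (n - 1)
    if oldest < 0 or t - tau >= len(u):
        raise IndexError("window falls outside the input stream")
    return [u[t - tau - i] for i in range(n)]

def parity(u, t, tau, n):
    """PAR_n(t): 1 if the window holds an odd number of ones, else 0."""
    return sum(window(u, t, tau, n)) % 2

def density(u, t, tau, n):
    """DNS_n(t): 1 if ones outnumber zeros in the window (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("the density task is defined for odd n only")
    return 1 if 2 * sum(window(u, t, tau, n)) > n else 0

u = [1, 0, 1, 1, 0, 0, 1, 0]
print(parity(u, t=5, tau=1, n=3))   # 0: bits u[4], u[3], u[2] = 0, 1, 1
print(density(u, t=5, tau=1, n=3))  # 1: two ones against one zero
```

Summing the window modulo 2 is equivalent to the XOR over its bits, and for n = 1 it reduces to u(t − τ), matching the first case of the parity definition.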



3. Training and Evaluation 

For every system, we randomly generate a training set S_T and a testing set S_G. For each
stream v ∈ S_T or u ∈ S_G, |v| = |u| = T. The sizes of the training and testing sets depend
on n, as given in the following table.

n    |S_T|   |S_G|
1    50      50
3    150     150
5    300     150
7    500     150
9    500     150

We train the output node with a form of stochastic gradient descent in which the weights of
the incoming connections are adjusted after every time step in each training example. Given
our system and tasks, this form of gradient descent appears to yield better training and
testing accuracies than the conventional forms. We use a learning rate η = 0.01, and train
the weights for up to 20,000 epochs. Since the dynamics of the underlying RBN are
deterministic and reset after each training stream, we terminate training early if an
accuracy of 1.0 is achieved on S_T. The accuracy of an RC device on a stream v ∈ S_T is the
number of times that the output matches the expected output as specified by the task, divided
by the total number of values in the output stream. The accuracies on the individual streams
are summed and divided by the total number of input streams in the set to calculate the
current training accuracy T. After the weights of the output layer are trained on the input
streams in S_T, they remain fixed. We then drive the reservoirs with input streams u ∈ S_G
and record the number of times that the output of the RC device matches the expected output.
The generalization capability G is then computed by dividing the total number of times in
which the output of the readout layer matches the correct output by the total number of
correct outputs. This process is averaged over all streams in S_G. In general, we are
interested in finding the reservoirs that maximize G.

FIG. 2. The computational capability Δ of RBN reservoirs with N = 500, L ∈ [1, N] and
τ ∈ [1, 9] varies according to ⟨K⟩ ∈ {1, 2, 3} and T = 10 in the left column and T = 100
in the right column. Panels: (a) ⟨K⟩ = 1, T = 10; (b) ⟨K⟩ = 1, T = 100; (c) ⟨K⟩ = 2,
T = 10; (d) ⟨K⟩ = 2, T = 100; (e) ⟨K⟩ = 3, T = 10; (f) ⟨K⟩ = 3, T = 100.
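The training and evaluation procedure of this subsection can be sketched as follows. The text does not give the exact update rule, so the perceptron-style online update below is an assumption chosen as one simple instance of per-time-step stochastic gradient descent. The function names are hypothetical, and the thresholded output takes values in {0, 1} to match the task targets. Only the learning rate, the epoch limit, and the early stop at accuracy 1.0 are taken from the text.

```python
def predict(weights, bias, state):
    """Thresholded readout: 1 if the weighted sum is positive, else 0."""
    act = sum(w * x for w, x in zip(weights, state)) + bias
    return 1 if act > 0 else 0

def accuracy(weights, bias, states, targets):
    """Fraction of time steps on which the readout matches the target."""
    hits = sum(predict(weights, bias, s) == y for s, y in zip(states, targets))
    return hits / len(targets)

def train_readout(states, targets, eta=0.01, max_epochs=20000):
    """Online training: weights are adjusted after every time step."""
    weights, bias = [0.0] * len(states[0]), 0.0
    for _ in range(max_epochs):
        for s, y in zip(states, targets):
            err = y - predict(weights, bias, s)  # -1, 0 or +1
            if err:
                # ASSUMED rule: perceptron-style step toward the target.
                weights = [w + eta * err * x for w, x in zip(weights, s)]
                bias += eta * err
        if accuracy(weights, bias, states, targets) == 1.0:
            break  # early termination once the training data are fit
    return weights, bias

# Toy usage: four "reservoir states" whose target is the OR of the two bits.
states = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]
weights, bias = train_readout(states, targets)
print(accuracy(weights, bias, states, targets))
```

In an actual experiment the states would be the reservoir states recorded while driving the RBN with each training stream, and the same `accuracy` function applied to held-out streams with fixed weights would give the generalization measure.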



IV. RESULTS 

A. Computational Capability 

The computational capability as predicted by Δ depends on the properties of the reservoir M,
the length of the input stream T, and the memory τ required by the reservoir. The properties
of M are determined by the dynamics, which are due primarily to ⟨K⟩ and the number of nodes
L which the input directly perturbs. For each L, ⟨K⟩, T, and τ we calculate the average
Δ(M, T, τ) of 50 instantiations of M. In figure 2 we present these results for
⟨K⟩ ∈ {1, 2, 3}. To produce figure 3 we sum Δ(M, T, τ) over all L. The dashed curves in
figures 3(a) and 3(b) are spline fits which highlight the greatest Δ values. In figures 2
and 3 we see that RBNs with critical connectivity ⟨K_c⟩ = 2 tend to provide the highest Δ.
A high Δ signifies that the reservoir dynamics have the ability to separate different input
streams, while having dynamics which are determined more by recent input than by past input.
In contrast, a low Δ signifies either or both of the following: i) the reservoir's dynamics
are too frozen to separate different input streams effectively or ii) traces of early
perturbations are never forgotten by the reservoir. The consequence of i) is the inability to
compute difficult tasks, such as PAR_n or those which require a long short-term memory. The
consequence of ii) is great difficulty in generalizing; if past information which is
irrelevant to computing the correct output in the readout layer dominates the dynamics of the
reservoir, the output layer will be unable to classify the dynamics caused by the more
relevant, recent input.

FIG. 3. The computational capability Δ of RBN reservoirs M with N = 500 and ⟨K⟩ ∈ (0, 4]
summed over L ∈ (0, 500]. The dashed curve is a spline fit to the highest Δ, illustrating
that near-critical connectivity maximizes computational capability, particularly in high T.
Panels: (a) Σ_L Δ(M, T, τ), where T = 10; (b) Σ_L Δ(M, T, τ), where T = 100.

We see that ⟨K⟩ = 1 has a very high Δ only when τ is small. This is due to the brief
short-term memory afforded to an RBN with subcritical dynamics. Since a network with
⟨K⟩ = 1 has little short-term memory, its computational capabilities are unaffected by an
increase in T, as demonstrated in figures 2(a) and 2(b): there is no memory at all of early
perturbations.

Chaotic reservoirs, represented here by ⟨K⟩ = 3, are characterized by their sensitivity to
initial perturbations, and a high separation. In two identical, chaotic systems, a single bit
difference in their respective input streams will eventually become magnified until the two
systems differ by the states of some P nodes. If the initial perturbation is larger than P,
then the differences in the systems will diminish until reaching P. Because of this, a
chaotic system could maximize its Δ in two different ways: i) compute over a sufficiently
short input stream and ii) perturb enough of the system so that the recent input has a more
significant effect on the dynamics than the past input. In i), the restriction of a brief
input stream can be relaxed if the input stream perturbs as few nodes as possible, giving the
system more time to propagate perturbations (see figure 2(e)). On the other hand, ii)
requires maximizing L (see figure 2(f)). However, even if distortion is staved off by slowing
down the propagation of external perturbations, the system is ultimately fated to disorder.



B. Information and Optimal Perturbation 

In the traditional implementations of reservoir com- 
puting, all the nodes in the reservoir are connected to the 
source of the input signal. Many task specific and generic 
measures of computation in reservoirs have been compre- 
hensively studied in fTS". However, the relationship be- 
tween the computational properties of the reservoir and 
the number of nodes which the input layer perturbs re- 
mains unexplored. Here, we use information theory to 
characterize the computation in the reservoir as infor- 
mation transfer between the input and the reservoir and 
between the reservoir and the output. 

In reservoir computing, the reservoir is a dynamical system and therefore has intrinsic
entropy. The input is also time-varying and we can calculate its entropy. In order to
reconstruct the desired function, the output layer has to pick up the traces of the input in
the reservoir dynamics. This fact is reflected in the entropy change of the reservoir due to
its input and therefore can be measured using mutual information between the input I and the
reservoir R, i.e., MI(I : R). In our study we distribute the input to the reservoir only
sparsely. We would thus like to find how MI(I : R) changes as a function of L and whether
there is an optimal L. Moreover, we would like to know, given a task to be solved, how much
information the reservoir can provide to the output. That is, given the desired output, can
the reservoir state be predictive of the output? This is equivalent to determining how much
information is transferred from the reservoir to the desired output. We show this measure
using MI(R : O), where O indicates the output as the target.

In order to calculate MI(I : R) and MI(R : O), we consider the instantaneous states of the
reservoir and its output to calculate the entropy. For the input, we need to calculate the
entropy over the states that the input can take over the window size n. For example, on an
input stream of length T = 50 bits, with window size n and time delay τ = 1, the input
pattern $u^n$ is an n-bit long moving window over the stream, starting at time step t = 0,
i.e., $\{u_i^n \mid 0 \le i \le T - \tau - n\}$. To calculate the entropy of the reservoir
we consider the collection of instantaneous reservoir states $s_j$ at time step j, i.e.,
$\{s_j \mid \tau + n \le j \le T\}$. The output pattern is calculated using the output of the
DNS_n(t) task. A subtlety arises while calculating the reservoir entropy: since the reservoir
follows deterministic dynamics, if L = 0, where the input signals do not perturb the
reservoir, then the reservoir dynamics will be identical when one repeats the experiment. For
reservoirs with chaotic dynamics, where each $s_j$ is unique, we have a many-to-one mapping
between the reservoir states and the output patterns, and therefore the reservoir states
appear to be capable of reconstructing the output completely. To get the correct result, one
must calculate the entropy over many streams. In this case, since the corresponding output
patterns change every time, the mapping between the reservoir state and the output will not
appear predictive of the output. We feed the reservoir with 50 randomly chosen input
sequences of length 50. The entropies H_I, H_R, and H_O are calculated using the states that
the input, the reservoir, and the desired output take during this interval of 50 time steps.
Note that there is no need to have an output layer in these experiments and the calculations
are independent of the training mechanism.

Figure 4 illustrates MI(I : R) and MI(R : O) as a function of L for reservoirs of
⟨K⟩ ∈ {1, 2, 3}. For comparison, we have also included H_I, H_R, and H_O. In an ideal
reservoir, in which the reservoir contains all the information from the input,
MI(I : R) = H_I and MI(R : O) = H_O, indicating that the reservoir contains the required
information to reconstruct the desired output perfectly. For ⟨K⟩ = 1 we see that growing L
increases MI(I : R) and MI(R : O) to a maximum level below the ideal values, even for
L = 500, where all the nodes in the reservoir are receiving the input. These systems do not
have enough capacity to calculate the desired output perfectly. For ⟨K⟩ = 2, we see that
mutual information increases and reaches the ideal level at L = 20. In these systems, the
sparse connectivity between input and reservoir is enough to provide all the required
information about the input to the reservoir. We see that at the same level of L the
reservoir dynamics are completely predictive of the output. For ⟨K⟩ = 3, the intrinsic
dynamics of the reservoir are very rich (supercritical dynamics) and the mutual information
between the reservoir and output reaches its peak at L = 5. In these systems even a small
perturbation spreads quickly. The reservoir at this perturbation level will have enough
information to reconstruct the output.



C. Task Solving 

We calculate the generalization capability G of RBN-RC devices with ⟨K⟩ ∈ {1, 2, 3},
N = 500, and L ∈ (0, 500] on the DNS_n and PAR_n tasks with random input streams of length
T ∈ {10, 100}. For each set of parameters, we instantiate, train, and test 30 RC devices.
Figures 5 and 6 present cubic spline fits to the average of these results. We observe in
figures 5 and 6 that critical dynamics provide the most robust generalization capability in
task solving. Ordered and chaotic reservoirs can evidently solve tasks under certain
circumstances. However, the ordered networks are limited by little short-term memory, while
the chaotic networks accumulate extraneous information from past perturbations and
demonstrate reduced performance as the length of the input stream increases.

FIG. 4. Mutual information between the input and reservoir MI(I : R) and reservoir and
output MI(R : O). We have also included the intrinsic information in the input stream,
reservoir, and output. For all of these cases τ = 1. In an ideal reservoir MI(I : R) = H_I
and MI(R : O) = H_O. For ⟨K⟩ = 1.0, as L grows both MI(I : R) and MI(R : O) grow to a
maximum level below the ideal level. The reservoir in these systems does not carry enough
information for the output layer to solve the task. For ⟨K⟩ = 2.0, as L grows, the mutual
information grows and reaches the sufficient level at L = 20. For ⟨K⟩ = 3.0, mutual
information peaks near L = 5. Small perturbations from the input provide enough information
to the reservoir to reconstruct the desired output.


1. Average Reservoir Indegree ⟨K⟩ = 1

When n of PAR_n and DNS_n is small, there is very little memory and processing required by
the reservoir, and so RC devices in which the reservoir has ⟨K⟩ = 1 can achieve perfect
generalization G for DNS_3 and PAR_3 (see figures 5(a) and 6(a)). However, ordered networks
are dominated by fading memory, hence the dynamics do not retain enough information about
past perturbations to achieve high accuracy when n increases. Since the dynamics of ordered
networks are only determined by their most recent perturbations, the length of the input
stream T is irrelevant for the task solving capability, which explains why the generalization
G of ordered networks computing DNS_3 is very similar when T = 10 and T = 100, as seen in
figures 5(a) and 5(b) respectively.

Since memory fades quickly in an ordered reservoir, input cannot propagate swiftly through
the network. Moreover, a ⟨K⟩ = 1 network will almost certainly possess islands. These
islands will be unreachable by an input stream that does not strongly perturb the system. In
addition, figure 4(b) demonstrates that an ordered network with ⟨K⟩ = 1 increases its mutual
information between input and reservoir MI(I : R) as L increases up to N. Because of this, we
see in figures 5 and 6 that increasing L tends to result in higher accuracy on DNS_n and
PAR_n respectively. Therefore, an increase of L can only increase the performance of the
reservoir.

FIG. 5. The generalization capability G of an RBN M on the task DNS_n is dependent on L and
⟨K⟩, as well as task parameters n and T. Notably, chaotic networks achieve their maximum
generalization capability with a lower L than ordered networks. Ordered networks possess
little memory and so their performance drops as n increases. On the other hand, chaotic
networks perform poorly with T = 100 as opposed to T = 10, due to an inadequately fading
memory.

FIG. 6. The generalization capability G of an RBN-RC device M on the task PAR_n is dependent
on L and ⟨K⟩, as well as task parameters n and T. Notably, chaotic networks achieve their
maximum generalization capability with a lower L than ordered networks. Ordered networks
possess little memory and so their performance drops as n increases. On the other hand,
chaotic networks perform poorly with T = 100 as opposed to T = 10, due to an inadequately
fading memory.



2. Average Reservoir Indegree ⟨K⟩ = 3



Chaotic reservoirs, represented here by ⟨K⟩ = 3, are dominated by the separation property.
As a result, the G of the chaotic networks is least affected by increasing the window n.
However, high performance is only possible when the input stream is sufficiently short, such
as T = 10. In figures 5(d) and 6(a) the stream length is T = 10 and the G of the chaotic
network is high. However, when the length of the input stream is increased to T = 100, the
performance drops significantly, even while the performance of networks with lower
connectivities remains relatively unchanged (see figures 5(a) and 5(b)). Though no longer
relevant to the RC device, early perturbations have a significant effect on the dynamics,
which makes it difficult for the output layer to extract information about the more recent
input. On the other hand, if the input stream is sufficiently short, chaotic reservoirs have
less time to be distorted by early input.

In a network which is chaotic due to high connectivity, 
there will be fewer and larger connected components than 
in those which are less densely connected [Mj. Therefore, 
the minimal L needed to adequately distribute the input 
signal in a (K) = 3 network is less than for the other
connectivities explored here (see figure 4(T)). Also, a chaotic
reservoir can effectively increase its computational
capability, as predicted by Δ, by reducing its L, which increases
the time it takes for perturbations to spread through the
system (see figure 2(e)). Evidently, the chaotic system
uses this strategy in computing DNSn when T = 10, as
seen in figure 5. However, this behavior is not observed
for PARn. The parity task is highly non-linear. We
speculate that, due to the complexity of the task,
separation capability is more significant than it is in DNS;
this makes a strategy which maximizes the separation
property by increasing L optimal.
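
The statement above that densely connected networks have fewer and larger connected components can be checked numerically. The sketch below is illustrative only: it assumes that (K) * N directed edges are placed uniformly at random among N nodes and counts weakly connected components with a union-find structure. The function names are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_digraph_edges(n_nodes, mean_k, rng):
    """Place mean_k * n_nodes directed edges uniformly at random
    (assumed wiring model; self-loops and repeated edges are allowed)."""
    n_edges = int(round(mean_k * n_nodes))
    return [(rng.randrange(n_nodes), rng.randrange(n_nodes))
            for _ in range(n_edges)]


def component_sizes(n_nodes, edges):
    """Sizes of the weakly connected components, largest first."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components

    counts = Counter(find(v) for v in range(n_nodes))
    return sorted(counts.values(), reverse=True)
```

Comparing len(component_sizes(...)) and the first entry of the returned list for mean_k = 1, 2 and 3 shows how the number of components shrinks and the largest component grows with connectivity. Under this wiring model, an input that reaches only a few nodes is more likely to reach most of a dense network than most of a sparse one.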



3. Average Reservoir Indegree (K) = 2

As observed in figure [Sj the difference between the
separation property and fading memory tends to be
maximized near the critical connectivity (K) = 2. This is
evident in our task solving results: when the n of DNSn
and PARn increases, these systems do not show the
dramatic drop in G that the ordered systems do (see figures 5
and 6). Simultaneously, the G of these systems is
unaffected by an increase in the stream length T, in contrast
to chaotic networks. In figure 4(d) we observe that with
L < 20 the reservoir cannot adequately propagate the
input signal, which is demonstrated by a lower G for very
small L in figures 5 and 6. However, increasing L in task
solving appears to afford more of a benefit than simply
increasing the information about the input stream. In
both figures 5 and 6 we see that the best G of the critical
networks occurs after the system has already achieved
maximal MI (I : R) between input stream and reservoir
dynamics.


V. SUMMARY AND DISCUSSION 

We investigated the computational capabilities of Ran- 
dom Boolean Networks when used as the dynamical com- 
ponent in Reservoir Computing devices. We found that 
computation tends to be maximized at the critical con- 
nectivity (Kc) = 2. However, in RC, the reservoir is 
continuously perturbed, and both the size of the pertur- 
bations as well as the length of time that the reservoir 
is perturbed for must be taken into account, along with 
the chaoticity of the dynamics. If the input stream is
sufficiently short, then chaotic systems can still perform
quite well, but as the length of the input stream increases,
these networks can no longer differentiate and generalize
on subsets of the input stream, as the past perturbations,
which may no longer be relevant to the computation,
dominate the dynamics. On the other hand, ordered
networks can perform well, independent of the length of
the input stream, as long as the window of computation
is sufficiently small, as an ordered system retains little
information about perturbations in the past.

A network view of the RC device can also give us more 
insight as to why connectivity influences performance. If 
the reservoir acts on the input stream as a set of spa- 
tiotemporal kernels, a suitable reservoir needs to include 
a diverse set of kernels. In [TU], we saw that at the connec- 
tivity (Kc) = 2 the network shows maximal topological 
diversity and dynamics. A reservoir with connectivity 
(Kc) = 2 can therefore act as many networks of the same
connectivity, each acting as a different kernel.

In [12] and [13] it was shown that optimal computation
occurs in recurrent neural networks at the critical points, 
and our results provide an additional example of this, in 
a binary, heterogeneous reservoir. In RC, we continu- 
ously perturb the reservoir and so the underlying RBN 
of our model is not a closed system. Therefore, compu- 
tation cannot be dependent on attractors and must be 
enabled entirely by the dynamics of the RBN. In some 
instances, the network dynamics can fall into an attractor 
temporarily or indefinitely, due to frozen dynamics, inad- 
equate distribution of the input signal, or a non-random 
input stream. Therefore, RC is a novel framework in 
which to explore the capacity of RBN dynamics for infor- 
mation processing. RBNs have been studied under other 
task solving scenarios; in Goudarzi et al. [10] networks
evolve towards criticality, although computation is still
performed by attractors. Our study shows that, unlike
the findings in [5], for RBNs there is a strong connection
between computation and dynamics, and optimality of 
the computation is evidently due to critical dynamics in 
the network. Despite the differences between externally 
perturbed RBNs in RC and RBNs explored as a closed 
system, we nevertheless observe that critical RBNs are 
indeed optimal for reservoir computing. 

Our conclusion provides an intriguing link between 
these disparate usages of RBNs. By providing evidence
that critical dynamics are desirable for heterogeneous 
substrates in RC, our findings may be relevant to the 
development of devices which exploit the intrinsic infor- 
mation processing capabilities of heterogeneous, physical 
systems such as biomolecular or nanoscale device networks.